
    Video Image Segmentation and Object Detection Using Markov Random Field Model

In this dissertation, the problem of video object detection is addressed. Initially this is attempted with the existing method of temporal segmentation. It has been observed that the Video Object Plane (VOP) generated by temporal segmentation alone has a strong limitation: for slow-moving video objects it either performs poorly or fails. Therefore, the problem of object detection is addressed for both slow-moving and fast-moving video objects. The object is detected by integrating spatial segmentation with temporal segmentation. In order to take the temporal pixel distribution into account, the spatial segmentation of frames has been formulated in a spatio-temporal framework. A compound Markov Random Field (MRF) model is proposed to model the video sequence; this model accounts for both the spatial and the temporal distributions. Besides taking into account the pixel distributions in the temporal direction, compound MRF models have been proposed to model the edges in the temporal direction; this model is named the edge-based model. Furthermore, the differences between successive frames have been modeled by an MRF, called the change-based model, which enhanced the performance of the proposed scheme. The spatial segmentation problem is formulated as a pixel labeling problem in the spatio-temporal framework, and the pixel labels are estimated using the Maximum a Posteriori (MAP) criterion. The segmentation is achieved in supervised mode, where the model parameters are selected on a trial-and-error basis. The MAP estimates of the labels are obtained by a proposed hybrid algorithm that integrates a globally convergent criterion with a locally convergent one. Temporal segmentation of frames has been obtained without assuming the availability of a reference frame.
The spatial and temporal segmentations have been integrated to obtain the Video Object Plane (VOP) and hence object detection. In order to reduce the computational burden, an evolutionary-approach-based scheme has been proposed: the first frame is segmented, and the segmentations of the subsequent frames are derived from that of the first frame, at a much lower computational cost than the previously proposed scheme. An entropy-based adaptive thresholding scheme is proposed to enhance the accuracy of temporal segmentation. The object detection is achieved by integrating the spatial segmentation with the improved temporal segmentation results.
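The MAP labeling step described above can be sketched with a toy Iterated Conditional Modes (ICM) pass, the locally convergent half of such a hybrid estimator. This is a minimal illustration assuming a plain Potts spatial prior and known class means; the dissertation's compound MRF adds temporal, edge-based, and change-based terms that are omitted here, and the parameter values are illustrative, not the dissertation's settings.

```python
def icm_segment(image, means, beta=6.0, iters=5):
    """MAP pixel labeling by ICM with a Potts spatial prior.

    Per-pixel energy: (y - mu_label)^2 + beta * (# disagreeing 4-neighbours).
    `image` is a 2D list of intensities; `means` are the class means.
    """
    h, w = len(image), len(image[0])
    # Initialise each pixel with its nearest class mean (data term only).
    labels = [[min(range(len(means)), key=lambda k: (image[r][c] - means[k]) ** 2)
               for c in range(w)] for r in range(h)]
    for _ in range(iters):
        for r in range(h):
            for c in range(w):
                nbrs = [labels[r2][c2] for r2, c2 in
                        ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                        if 0 <= r2 < h and 0 <= c2 < w]
                # Greedily pick the label minimising the local energy.
                labels[r][c] = min(
                    range(len(means)),
                    key=lambda k: (image[r][c] - means[k]) ** 2
                                  + beta * sum(1 for n in nbrs if n != k))
    return labels
```

On a small two-region image with one noisy pixel, the spatial prior pulls the outlier back to the label of its neighbourhood, which is the smoothing effect the MAP formulation relies on.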

    C3D and Localization Model for Locating and Recognizing the Actions from Untrimmed Videos (Student Abstract)

In this article, we propose a technique for action localization and recognition in long untrimmed videos. It consists of a C3D CNN model followed by action mining using the localization model, where a KNN classifier is used. We segment the video into expressible sub-actions known as action-bytes. Pseudo-labels have been used to train the localization model, which makes the trimmed videos untrimmed with respect to action-bytes. We present experimental results on the recent benchmark trimmed video dataset “Thumos14”.
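The KNN classification step in the localization model can be illustrated with a bare majority-vote classifier over clip-level feature vectors. The feature vectors and action labels below are hypothetical stand-ins for C3D features and action-byte labels, not data from the paper.

```python
from collections import Counter

def knn_predict(train_feats, train_labels, query, k=3):
    """Plain k-nearest-neighbour vote: Euclidean distance, majority label."""
    # Sort training items by squared distance to the query.
    ranked = sorted(
        (sum((a - b) ** 2 for a, b in zip(feat, query)), label)
        for feat, label in zip(train_feats, train_labels))
    # Majority vote among the k closest.
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

A query feature lying near one cluster of training clips is assigned that cluster's action label, which is all the localization step needs from the classifier.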

    Early Detection of Diabetic Retinopathy From Big Data In Hadoop Framework

In this article, we have designed a fast and reliable Diabetic Retinopathy (DR) detection technique in the Hadoop framework, which can identify the early signs of diabetes from retinal images of the eye. In the proposed scheme, the retinal images are classified into five categories: No DR, Mild DR, Moderate DR, Severe DR, and Proliferative DR. The scheme follows three distinct steps for classification of the diabetic retinopathy images: feature extraction, feature reduction, and image classification. In the initial stage of the algorithm, the Histogram of Oriented Gradients (HOG) is used as a feature descriptor to represent each retinal image. Principal Component Analysis (PCA) is then used for dimensionality reduction of the HOG features. In the final stage of the algorithm, a K-Nearest Neighbors (KNN) classifier is used, in a distributed environment, to assign the retinal images to the different classes. Experiments have been carried out on a substantial number of high-resolution retinal images taken under an assortment of imaging conditions, with both left and right eye images provided for every subject. To handle such large datasets, the Hadoop platform is used with the MapReduce and Mahout frameworks for programming. The results obtained by the proposed scheme are compared with some close competitive state-of-the-art techniques, and the proposed technique is found to provide better results than the existing approaches in terms of standard performance evaluation measures.
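The distributed KNN classification can be sketched in MapReduce style: each map task emits its local k nearest candidates for a query image, and the reduce step merges those candidate lists and takes the majority vote. This is a minimal stdlib sketch of the pattern, with hypothetical in-memory partitions standing in for HDFS splits of HOG+PCA feature vectors; it is not the paper's Hadoop/Mahout implementation.

```python
import heapq
from collections import Counter

def map_partial_knn(partition, query, k):
    """Map step: a node scans its local split and emits its k nearest
    (squared distance, label) pairs for the query feature vector."""
    dists = ((sum((a - b) ** 2 for a, b in zip(feat, query)), label)
             for feat, label in partition)
    return heapq.nsmallest(k, dists)

def reduce_knn(partials, k):
    """Reduce step: merge the per-node candidate lists, keep the global
    k nearest, and return the majority-vote class label."""
    merged = heapq.nsmallest(k, (pair for part in partials for pair in part))
    return Counter(label for _, label in merged).most_common(1)[0][0]
```

The point of the split is that each map task only ever sees its own partition, so the per-node work and shuffle volume stay bounded by k regardless of dataset size.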

    Two-Streams: Dark and Light Networks with Graph Convolution for Action Recognition from Dark Videos (Student Abstract)

In this article, we propose a two-stream action recognition technique for recognizing human actions in dark videos. The proposed network consists of an image enhancement network with a Self-Calibrated Illumination (SCI) module, followed by a two-stream action recognition network. We use R(2+1)D as the feature extractor for both streams, with shared weights. A Graph Convolutional Network (GCN), a temporal graph encoder, is utilized to enhance the obtained features, which are then fed to a classification head to recognize the actions in a video. Experimental results are presented on the recent benchmark “ARID” dark-video database.
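The temporal graph encoding can be sketched as a single graph-convolution step over a chain graph of clip features, where each node aggregates its temporal neighbours before a linear map. This is a simplified, hypothetical illustration (mean aggregation over a chain adjacency with self-loops), not the paper's exact GCN architecture or weights.

```python
def temporal_graph_conv(features, weight):
    """One graph-convolution step over a temporal chain graph.

    `features`: T x D_in list of per-clip feature vectors.
    `weight`:   D_in x D_out linear map.
    Each time step aggregates {t-1, t, t+1} (mean, i.e. normalised
    chain adjacency with self-loops), then applies the linear map.
    """
    t_steps, d_in = len(features), len(features[0])
    agg = []
    for t in range(t_steps):
        nbrs = [features[i] for i in (t - 1, t, t + 1) if 0 <= i < t_steps]
        agg.append([sum(v[d] for v in nbrs) / len(nbrs) for d in range(d_in)])
    # Linear transform: agg @ weight.
    d_out = len(weight[0])
    return [[sum(agg[t][i] * weight[i][j] for i in range(d_in))
             for j in range(d_out)] for t in range(t_steps)]
```

The aggregation is what lets each clip's representation absorb context from adjacent clips, which is the "enhancement" role the GCN plays before the classification head.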

    Do We Need a New Large-Scale Quality Assessment Database for Generative Inpainting Based 3D View Synthesis? (Student Abstract)

The advancement of image-to-image translation techniques using generative deep-learning-based approaches has shown promising results for the challenging task of inpainting-based 3D view synthesis. At the same time, even current 3D view synthesis methods often create distorted structures or blurry textures inconsistent with the surrounding areas. We analyzed recently proposed algorithms for inpainting-based 3D view synthesis and observed that they no longer produce stretching and black holes. However, existing databases such as IETR, IRCCyN, and IVY contain 3D-generated views with exactly these artifacts. This observation suggests that the existing 3D view synthesis quality assessment algorithms cannot judge the quality of the most recent 3D synthesized views. In this light, this abstract analyzes, using a test dataset, the need for a new large-scale database and a new perceptual quality metric oriented toward 3D views.

    Target tracking using a mean-shift occlusion aware particle filter

Most sequential importance resampling tracking algorithms use an arbitrarily high number of particles to achieve better performance, with consequently huge computational costs. This article aims to address the problem of occlusion, which arises in visual tracking, using a smaller number of particles. To this end, the mean-shift algorithm is incorporated into the probabilistic filtering framework, which allows the smaller particle set to maintain multiple modes of the state probability density function. Occlusion is detected based on the correlation coefficient between the reference target and the candidate at the filtered location. If occlusion is detected, the transition model for the particles is switched to a random walk model, which enables a gradual outward spread of particles over a larger area. This enhances the probability of recapturing the target post-occlusion, even when it has changed its normal course of motion while occluded. The likelihood model of the target is built using a combination of a color distribution model and edge orientation histogram features, which represent the target appearance and the target structure, respectively. The algorithm is evaluated on three benchmark computer vision datasets: OTB100, VOT18, and TrackingNet, and its performance is compared with fourteen state-of-the-art tracking algorithms. From the quantitative and qualitative results, it is observed that the proposed scheme runs in real time and performs significantly better than the state of the art on sequences involving occlusion and fast motion.
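The occlusion test and the transition-model switch described above can be sketched as follows. The correlation coefficient here is plain Pearson correlation between two feature vectors, and the motion model, noise spreads, and 2D state are illustrative assumptions, not the paper's settings.

```python
import math
import random

def pearson(x, y):
    """Correlation coefficient between reference-target and candidate features;
    a low value signals that the target is likely occluded."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def propagate(particles, occluded, velocity=(2.0, 0.0),
              spread=1.0, wide=4.0, rng=random):
    """Transition-model switch: a constant-velocity model while the target is
    visible, and a zero-mean random walk with a wider spread once occlusion is
    detected, so the particles gradually fan out over a larger area."""
    out = []
    for x, y in particles:
        if occluded:
            out.append((x + rng.gauss(0, wide), y + rng.gauss(0, wide)))
        else:
            out.append((x + velocity[0] + rng.gauss(0, spread),
                        y + velocity[1] + rng.gauss(0, spread)))
    return out
```

In a full tracker the `occluded` flag would come from thresholding the correlation at the filtered location each frame, and the widened random walk is what lets a small particle set recapture a target that changed course while hidden.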